Topic Classification for Short Texts

نویسندگان

چکیده

In the context of TV and social media surveillance, constructing models to automate topic identification short texts is a key task. This paper constructs worth-to-consider for practical usage, employing Top-K multinomial classification methodology. We describe full data processing pipeline, discussing about dataset selection, text preprocessing, feature extraction, model selection learning, including hyperparameter optimization. will test compare popular methods including: standard machine deep fine-tuned BERT classification.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topic Segmentation for Short Texts

Topic segmentation, which aims to fmd the boundaries between topic blocks in a text, is an important task for semantic analysis of texts. Although different solutions have been proposed for the task, many limitations and difficulties exist in the approaches. In particular most of the methods do not work well for such case as short texts, internet news and student's writings. In this paper, we f...

متن کامل

BUAP: Polarity Classification of Short Texts

We report the results we obtained at the subtask B (Message Polarity Classification) of SemEval 2014 Task 9. The features used for representing the messages were basically trigrams of characters, trigrams of PoS and a number of words selected by means of a graph mining tool. Our approach performed slightly below the overall average, except when a corpus of tweets with sarcasm was evaluated, in ...

متن کامل

Classification of Short Legal Lithuanian Texts

Statistical analysis of parliamentary roll call votes is an important topic in political science because it reveals ideological positions of members of parliament (MP) and factions. However, it depends on the issues debated and voted upon. Therefore, analysis of carefully selected sets of roll call votes provides a deeper knowledge about MPs. However, in order to classify roll call votes accord...

متن کامل

Multi-value Classification of Very Short Texts

We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme using Naive Bayes, Rocchio and kNN classifiers on the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text even though we only classify headlines. Finally, we app...

متن کامل

Topic Modeling over Short Texts by Incorporating Word Embeddings

Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Lecture notes in information systems and organisation

سال: 2023

ISSN: ['2195-4976', '2195-4968']

DOI: https://doi.org/10.1007/978-3-031-32418-5_12